
ggml-virtgpu: make the code thread safe#19204

Merged
taronaeo merged 12 commits into ggml-org:master from kpouget:leaks
Feb 4, 2026

Conversation

@kpouget (Contributor) commented Jan 30, 2026

This PR makes the ggml-virtgpu backend thread safe by using a mutex to guard access to the host<>guest shared memory buffers, and by pre-caching, during initialization, the constant values queried from the backend.

The unused buffer_type_is_host method is also deprecated.
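The two patterns described above can be sketched roughly as follows. This is a minimal illustration with invented names (`virtgpu_shared_buffer`, `virtgpu_context`, `query_device_props` are all hypothetical), not the actual backend code: a mutex serializes transfers through the shared region, and constant device properties are queried once at initialization instead of on every call.

```cpp
#include <cassert>
#include <cstring>
#include <mutex>
#include <string>

// Hypothetical stand-ins for the backend's types.
struct virtgpu_shared_buffer {
    std::mutex mtx;      // serializes host<>guest transfers
    char data[4096] {};  // stand-in for the shared memory region
};

struct device_props {
    std::string name;    // constant after init
    size_t max_alloc;
};

// Pretend remote query; normally a guest->host round trip.
static device_props query_device_props() {
    return { "virtgpu-device", 1u << 20 };
}

struct virtgpu_context {
    virtgpu_shared_buffer shmem;
    device_props props;  // queried once at init, read-only afterwards

    virtgpu_context() : props(query_device_props()) {}

    // Every transfer takes the lock, so concurrent calls cannot
    // interleave their writes into the shared region.
    void write_shared(const void * src, size_t size) {
        std::lock_guard<std::mutex> lock(shmem.mtx);
        std::memcpy(shmem.data, src, size);
    }
};
```

Because `props` is filled in before any worker thread exists, later reads need no locking; only the mutable shared buffer needs the mutex.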

@github-actions github-actions bot added python python script changes ggml changes relating to the ggml tensor library for machine learning labels Jan 30, 2026
@taronaeo (Collaborator) left a comment


I see that some of the GGML_ABORT statements do not include __func__. Would it be better to include them for debugging and error tracing in the future? :)

@kpouget (Contributor, Author) commented Jan 30, 2026

Thanks for the feedback, I've applied the suggestions.

I see that some of the GGML_ABORT statements do not include __func__. Would it be better to include them for debugging and error tracing in the future? :)

I'm not sure about this one, that's not a common pattern in the code base,

$ gg GGML_ABORT | grep __func__ | wc -l
37 # including it
$ gg GGML_ABORT | grep -v __func__ | wc -l
446 # not including it

and when I hit an abort, I automatically get a stack trace:

ggml/src/ggml-virtgpu/virtgpu-common.cpp:19: calling abort
bin/libggml-base.so.0(+0x75ce) [0x720ce69285ce]
bin/libggml-base.so.0(ggml_print_backtrace+0x25f) [0x720ce692884c]
bin/libggml-base.so.0(ggml_abort+0x160) [0x720ce6928a1b]
libggml-virtgpu.so.0(+0x8be5) [0x720ce6a14be5]
libggml-virtgpu.so.0(+0x8c08) [0x720ce6a14c08]

so I don't think it's necessary, do you?
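(For reference, the pattern under discussion looks roughly like this. A minimal sketch with invented names, `format_error` and `failing_op` are not the actual ggml API: prefixing the message with `__func__` bakes the failing function's name into the log line, independent of backtrace support.)

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build an error line with a backend prefix and the caller's name;
// "ggml-virtgpu:" and format_error are illustrative, not the real API.
static std::string format_error(const char * func, const char * msg) {
    char buf[256];
    std::snprintf(buf, sizeof(buf), "ggml-virtgpu: %s: %s", func, msg);
    return buf;
}

static std::string failing_op() {
    // The real code would call GGML_ABORT("%s: ...", __func__) instead
    // of returning; we return the string here so it can be inspected.
    return format_error(__func__, "unsupported operation");
}
```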

@taronaeo (Collaborator) commented Jan 30, 2026

so I don't think it's necessary, do you?

I usually do it for bug reporting purposes, so it's easier to identify which line of code is failing and the order in which things happened. But it's just a suggestion, feel free to ignore it if we aren't expecting specific GGML_ABORT lines to hit/fail often haha

Edit: Also another thought. Most users are end-consumers who do not compile from source and instead use a pre-built release binary. IIRC (I may be wrong), release builds do not show full backtrace information for the failing lines of code leading to the abort, or the information may have been optimized out; that makes it much harder to debug.

@kpouget (Contributor, Author) commented Jan 30, 2026

Edit: Also another thought. Most users are end-consumers who do not compile from source and rather, use a pre-built release binary. IIRC, I may be wrong, release builds do not show full backtrace information on the failing lines of code leading to the abort, or may have just been optimized out; much harder to debug.

Good point, I missed that. But I was thinking about it as well, and I think what matters is that the message is unique, so that you can locate it easily when you get a user's log trace.

Anyway, I'll think about it; I want to improve the error message when running in an unsupported environment (no virtgpu should be good already, but unpatched virglrenderer can be improved, I guess).

@kpouget (Contributor, Author) commented Feb 3, 2026

I've updated the code with cleaner logging, and I finally followed your suggestion with __func__. ABORT/ERROR/WARNING now include it (not INFO, on purpose), and they all include the ggml-virtgpu: prefix for clarity. Makes the logging more readable.

I also reworked the abort on init, so that it doesn't abort if the virtgpu isn't detected.

@taronaeo (Collaborator) left a comment

Minor formatting changes :)

@kpouget (Contributor, Author) commented Feb 3, 2026

good catches, thanks, fixed as suggested :)

@taronaeo (Collaborator) commented Feb 3, 2026

Great! Wait for CI to go green and we'll merge.

@taronaeo (Collaborator) commented Feb 4, 2026

CI / ggml-ci-x64-nvidia-vulkan-cm (pull_request) failure is unrelated to this PR. Merging.

@taronaeo merged commit 015deb9 into ggml-org:master on Feb 4, 2026
79 of 80 checks passed
agent-enemy-2 pushed a commit to agent-enemy-2/llama.cpp that referenced this pull request Feb 4, 2026
* ggml-virtgpu: regenerate_remoting.py: add the ability to deprecate a function

* ggml-virtgpu: deprecate buffer_type is_host remoting

not necessary

* ggml-virtgpu: stop using static vars as cache

The static init isn't thread safe.

* ggml-virtgpu: protect the use of the shared memory to transfer data

* ggml-virtgpu: make the remote calls thread-safe

* ggml-virtgpu: backend: don't continue if couldn't allocate the tensor memory

* ggml-virtgpu: add a cleanup function for consistency

* ggml-virtgpu: backend: don't crash if buft->iface.get_max_size is missing

* fix style and ordering

* Remove the static variable in apir_device_get_count

* ggml-virtgpu: improve the logging

* fix review minor formatting changes
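The "stop using static vars as cache" commit above can be sketched like this (hypothetical names; `remote_get_device_count`, `get_count_racy`, and `backend_state` are invented for illustration): a lazily assigned function-local static races when two threads call it at once, while a value fetched once during single-threaded initialization is safe to read afterwards.

```cpp
#include <cassert>

// Pretend guest->host query; the real call crosses the virtgpu boundary.
static int remote_get_device_count() { return 1; }

// Racy pattern being removed: two threads can both observe count == -1
// and issue the remote call, and the plain int write is not atomic.
int get_count_racy() {
    static int count = -1;
    if (count == -1) {
        count = remote_get_device_count();
    }
    return count;
}

// Replacement: fetch the value once while the backend initializes
// (single-threaded), then treat it as read-only.
struct backend_state {
    int device_count;
    backend_state() : device_count(remote_get_device_count()) {}
};
```

Note that C++11 makes the *initialization* of a local static thread safe, but the later `count = ...` assignment is an ordinary data race, which is why moving the query to init time is the cleaner fix.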

Labels

ggml changes relating to the ggml tensor library for machine learning python python script changes


2 participants